network;  the  sorting  algorithm  of  [BH-82]  will  suffice.    The  advantage  of  the  latter 
sorting  algorithm  is  that  the  constants  in  the  running  time  are  much  smaller. 

The previous best result for selection on the PRAM was the following: $O(n)$
operations in time $O(\log n \log\log n)$ [Vi-83a]. Much previous work on parallel
algorithms for the selection problem has concentrated on Valiant's comparison
model [V-75]. In this model, only the time used for comparisons is counted. The
following results have been achieved: with $n$ processors, a lower bound of
$\Omega(\log\log n)$ time [V-75], an upper bound of $O([\log\log n]^2)$ time [CY-85], and an
upper bound of $O(\log\log n)$ time [KSS-86]. Also, in this model, [R-81] gave an $O(1)$
time probabilistic algorithm for $n$ processors.

Our  algorithm,  like  the  algorithm  of  [CY-85],  uses  a  procedure  for  computing 
approximations  to  the  k-th  smallest  item.  The  selection  algorithm  itself  is  described 
in  section  2.  We  give  the  approximation  algorithm  in  section  3.  In  order  to 
simplify  the  presentation  we  assume  that  all  the  items  are  distinct.  Also,  we  ignore 
all  rounding  errors. 

2.   The  selection  algorithm 

We use the following notation: $\log^{(i)}$ denotes the $i$-th iteration of the $\log$
function. We define $\log^* n$ to be the least $i$ such that $\log^{(i)} n \le 1$. Also, we introduce
the following function $f$: $f(1) = 1$, and $f(i+1) = 2^{f(i)}$, for $i \ge 1$. We note that
$\log n \le f(\log^* n) < n$, for $n \ge 2$.
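
The two functions are easy to tabulate. The following is a minimal sketch, assuming base-2 logarithms and our own helper names `log_star` and `f`; it checks the stated inequality for a few values of n rather than proving it.

```python
import math

def log_star(n):
    """Least i such that the i-times iterated base-2 log of n is at most 1."""
    i, x = 0, float(n)
    while x > 1:
        x = math.log2(x)
        i += 1
    return i

def f(i):
    """f(1) = 1 and f(i+1) = 2**f(i), giving 1, 2, 4, 16, 65536, ..."""
    value = 1
    for _ in range(i - 1):
        value = 2 ** value
    return value

# Spot-check log n <= f(log* n) < n for several n >= 2.
for n in (2, 3, 16, 17, 1000, 65536, 65537):
    assert math.log2(n) <= f(log_star(n)) < n
```

Because f grows as a tower of twos, log* n is at most 5 for every n up to 2^65536.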

We use an approximate selection algorithm with the following input/output
relation, for $i \ge 2$.

Input: A set of $n/[f(i)]^2$ items in an array, and an integer $k$.

Output: Two items, one smaller than the $k$-th input item, and one larger than the
$k$-th input item. We also specify the accuracy of the approximation. The smaller item
is at least the $\left(k - \frac{n}{2[f(i+1)]^2}\right)$-th input item, while the larger item is at most the
$\left(k + \frac{n}{2[f(i+1)]^2}\right)$-th input item.

The approximation algorithm requires $O(n/f(i))$ operations and time $O(\log n)$.
The selection algorithm has the following form.

1) do $i = 1$ to $\log^* n - 1$

(i) Compute the 2 approximations to the $k$-th smallest item, one smaller and
one larger than the $k$-th smallest item.
(ii) Let them have ranks $k_1$ and $k_2$.
(iii) Extract the items strictly between the $k_1$-th and $k_2$-th items.
(iv) Place the extracted items into an array of size $n/[f(i+1)]^2$.
(v) $k := k - k_1$.
od;

2) Sort the remaining $\le n/\log^2 n$ items using the [AKS-83] sorting network; the $k$-th
item is the item sought.
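
The control flow of steps 1) and 2) can be sketched sequentially. This is only an illustration under loud assumptions: the approximations come from a stand-in oracle that fully sorts the current items (the paper computes them without a full sort), the error is rounded to an integer of at least 1 (the paper ignores rounding), and the names `select`, `log_star` and `f` are ours.

```python
import math

def log_star(n):
    i, x = 0, float(n)
    while x > 1:
        x = math.log2(x)
        i += 1
    return i

def f(i):
    value = 1
    for _ in range(i - 1):
        value = 2 ** value
    return value

def select(items, k):
    """Return (k-th smallest item, sizes of the successive arrays); k is 1-indexed."""
    n = len(items)
    current = list(items)
    sizes = [len(current)]
    for i in range(1, log_star(n)):
        # Stand-in oracle (assumption): rank the items by sorting them.
        ranked = sorted(current)
        err = max(1, n // (2 * f(i + 1) ** 2))
        k1 = max(0, k - err)                # lower approximation; rank 0 plays "minus infinity"
        k2 = min(len(ranked) + 1, k + err)  # upper approximation; rank len+1 plays "plus infinity"
        current = ranked[k1:k2 - 1]         # items strictly between ranks k1 and k2
        k -= k1                             # step (v)
        sizes.append(len(current))
    return sorted(current)[k - 1], sizes    # step 2): final sort

items = [(i * 7919) % 1009 for i in range(1000)]  # 1000 distinct values
value, sizes = select(items, 500)
assert value == sorted(items)[499]
```

The list `sizes` records how quickly the candidate set shrinks: after iteration i it holds at most about n/[f(i+1)]^2 items, which is what makes the later iterations cheap.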

The $i$-th iteration of the selection algorithm requires $O(n/f(i))$ operations and
time $O(\log n)$. The final sort requires $O(n/\log n)$ operations and time $O(\log n)$.
Thus, the selection algorithm requires $O(n)$ operations and time $O(\log n \log^* n)$.
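
The linear operation count rests on how fast $f$ grows. As a worked check (our own elaboration, using only the definition $f(1) = 1$, $f(i+1) = 2^{f(i)}$), note that $2^x \ge 2x$ for every integer $x \ge 1$, so $f(i) \ge 2^{i-1}$ and the per-iteration costs are dominated by a geometric series:

```latex
\[
\sum_{i=1}^{\log^* n - 1} \frac{n}{f(i)}
  \;\le\; \sum_{i \ge 1} \frac{n}{2^{i-1}}
  \;=\; 2n ,
\qquad
\underbrace{(\log^* n - 1)\cdot O(\log n)}_{\text{loop}}
  + \underbrace{O(\log n)}_{\text{final sort}}
  \;=\; O(\log n \log^* n).
\]
```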
ABSTRACT

We give an optimal parallel algorithm for selection on the EREW
PRAM. It requires a linear number of operations and $O(\log n \log^* n)$
time.

1.   Introduction 

The  models  of  parallel  computation  used  in  this  paper  are  the  exclusive-read 
exclusive-write  (EREW),  and  the  concurrent-read  exclusive-write  (CREW)  parallel 
random  access  machine  (PRAM).  A  PRAM  employs  p  synchronous  processors  all 
having  access  to  a  common  memory.  An  EREW  PRAM  does  not  allow 
simultaneous  access  by  more  than  one  processor  to  the  same  memory  location  for 
read  or  write  purposes,  while  a  CREW  allows  concurrent  access  for  reads  but  not 
for  writes.    See  [Vi-83b]  for  a  survey  of  results  concerning  PRAMs. 

Let  Seq(n)  be  the  fastest  known  worst-case  running  time  of  a  sequential 
algorithm,  where  n  is  the  length  of  the  input  for  the  problem  at  hand.  Obviously, 
the  best  upper  bound  on  the  parallel  time  achievable  using  p  processors,  without 
improving the sequential result, is of the form $O(\mathit{Seq}(n)/p)$. A parallel algorithm
that  achieves  this  running  time  is  said  to  have  optimal  speed-up  or  more  simply  to  be 
optimal.  A  primary  goal  in  parallel  computation  is  to  design  optimal  algorithms  that 
also  run  as  fast  as  possible.  The  following  notions  of  "fast"  and  "efficient"  are  in 
line  with  this  goal.  Given  a  parallel  algorithm  that  runs  in  time  T  using  p 
processors,  we  say  that  it  performs  (a  total  of)  pT  operations.  (Recall  that  a 
theorem  due  to  [B-74]  implies  that,  under  quite  general  conditions,  any  algorithm 
that performs $x$ operations in time $t$ can be implemented so as to achieve time $O(t)$
using $x/t$ processors.) Let $A$ (resp. $B$) be a parallel algorithm that performs $x_1$
operations (resp. $x_2$ operations) in time $T_1$ (resp. time $T_2$) on a given input. We say
that algorithm $A$ is more efficient than algorithm $B$ if $x_1 \le x_2$ (regardless of the
relation between $T_1$ and $T_2$). Also, algorithm $A$ is faster than algorithm $B$ if $T_1 \le T_2$
(regardless of the relation between $x_1$ and $x_2$).

In  view  of  our  interest  in  efficiency  we  shall  give  all  our  results  in  terms  of  the 
following  two  parameters:  operations  performed  and  running  time.  To  obtain  the 
number  of  processors  used,  simply  divide  the  operation  count  by  the  running  time. 
This  presentation  has  two  advantages.  First  it  provides  for  a  comparison  'at  a 
glance'  with  the  best  sequential  algorithm.  Second,  it  may  make  the  description  of 
the  algorithm  simpler:  at  times  it  is  convenient  to  describe  the  algorithm  as  if 
different  numbers  of  processors  are  being  used  for  different  steps  of  the  algorithm, 
and  simply  to  count  the  number  of  operations  performed.  We  are  assured,  by 
Brent's  theorem  [B-74],  that  it  is  straightforward  to  simulate  such  an  algorithm  by  a 
uniform  number  of  processors.  We  remark  that  Brent's  theorem  ignores  the 
following  implementation  issue:  how  to  assign  operations  to  processors.  But  for  the 
algorithm  given  here  this  will  present  no  difficulty.  The  details  will  be  left  to  the 
reader. 
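
The simulation behind Brent's theorem can be illustrated numerically. The sketch below is our own (the function name `brent_time` and the sample operation counts are hypothetical): each step's operations are shared among p processors, and the resulting time is compared with the bound x/p + t. Like the theorem, it ignores how operations are assigned to processors.

```python
def brent_time(ops_per_step, p):
    """Parallel time when step j's operations are shared evenly among p processors."""
    return sum(-(-x // p) for x in ops_per_step)  # ceil(x / p) for each step

# An algorithm described with a different processor count at each of t = 5 steps.
ops_per_step = [1000, 500, 250, 125, 125]
x, t = sum(ops_per_step), len(ops_per_step)
p = x // t  # "divide the operation count by the running time"

# Simulating with p processors costs at most x/p + t steps, i.e. O(t).
assert brent_time(ops_per_step, p) <= x // p + t
```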

We give an EREW PRAM algorithm for selection that uses $O(n)$ operations
and $O(\log n \log^* n)$ time; it assumes that the input is provided in an array of length $n$.
The  algorithm  makes  use  of  the  [AKS-83]  sorting  network.  However,  if  the 
algorithm  is  implemented  on  a  CREW  PRAM  we  do  not  need  to  use  this  sorting 


We  apply  the  reduction  procedure  recursively  to  obtain  the  approximation;  let  e 
be  the  item  selected. 

Lemma 1: Item $e$ is at least the $\left(k_i - \frac{n}{2[f(i)]^2[f(i+1)]^2}\right)$-th item in $S_i$.

Proof: We prove the result by induction on $j = \log^* n - i$. Clearly, the base case,
$j = 0$ ($i = \log^* n$), is true, since the set $S_{\log^* n}$ is sorted, so the item selected is the
desired item. For the inductive step, suppose that $e$ is at least the
$\left(k_{i+1} - \frac{n}{2[f(i+1)]^2[f(i+2)]^2}\right)$-th item in $S_{i+1}$. Then, in $S_i$, $e$ is at least the
$\left(k_i - \frac{n}{4[f(i)]^2[f(i+1)]^2} - \frac{n}{2[f(i+2)]^2}\right)$-th item, that is, at least the
$\left(k_i - \frac{n}{2[f(i)]^2[f(i+1)]^2}\right)$-th item. □
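
The last step of the induction combines two error terms into one. As we read the proof, this needs $[f(i+2)]^2 \ge 2[f(i)]^2[f(i+1)]^2$; the following short derivation (our own elaboration, writing $x = f(i)$, a positive integer) spells out why that holds:

```latex
\[
\frac{n}{4[f(i)]^2[f(i+1)]^2} + \frac{n}{2[f(i+2)]^2}
  \;\le\; \frac{n}{2[f(i)]^2[f(i+1)]^2}
\iff
[f(i+2)]^2 \;\ge\; 2[f(i)]^2[f(i+1)]^2 .
\]
With $f(i+1) = 2^{x}$ and $f(i+2) = 2^{2^{x}}$, taking base-2 logarithms gives the
equivalent condition
\[
2^{x+1} \;\ge\; 1 + 2\log_2 x + 2x .
\]
For every integer $x \ge 1$ we have $\log_2 x \le x - 1$ and $2^{x-1} \ge x$, hence
\[
1 + 2\log_2 x + 2x \;\le\; 4x - 1 \;<\; 4x \;\le\; 2^{x+1} .
\]
```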

Corollary 1: Item $e$ is at least the $\left(k_i - \frac{n}{2[f(i+1)]^2}\right)$-th item in $S_i$.

Lemma 2: Item $e$ is smaller than the $k_i$-th item in $S_i$, for $i < \log^* n$.

Proof: This follows immediately from the fact that the $k_{i+1}$-th item in $S_{i+1}$ is less
than the $k_i$-th item in $S_i$. □

We note that the approximation algorithm takes $O(\log n)$ time and performs
$O(n/f(i))$ operations.

Remark: We can avoid using the [AKS-83] network by replacing it with Borodin
and Hopcroft's parallel sorting algorithm, which runs in time $O(\log n)$ on $n \log n$
processors in the CREW PRAM model. In the selection algorithm above, we
replace  each  term  [/"(j)]^  by  a  term  [/(/)]'',  and  each  term  [/(/)]''  by  a  term  [/(/)]  . 
The analysis is identical (on performing the textual substitutions there also).

Remark 1: We can achieve a runtime of $O(h \log n)$ at the cost of using $O(n \log^{(h)} n)$
operations, as follows. We use essentially the algorithm given above; however, the
first iteration of the loop will produce approximations to the $k$-th item with an error
of at most $\frac{n}{2[f(\log^* n - h)]^2}$. This is achieved by using
$n[f(\log^* n - h - 1)]^2 < n[\log^{(h+1)} n]^2 \le n \log^{(h)} n$ processors to compute the initial
approximations.

Remark  2:  The  structure  of  the  selection  algorithm  is  very  similar  to  that  of  the  list 
ranking  algorithm  of  [CV-86a].  In  [CV-86b]  this  structure  is  called  an  accelerating 
cascade. 

3.  The  approximation  algorithm 

We discuss how to find a lower approximation to the $k$-th item. Finding a
higher approximation is a symmetric operation. Our algorithm will use $O(n/f(i))$
operations and time $O(\log n)$ if the input is of size at most $n/[f(i)]^2$; the error in the
approximation will be at most $\frac{n}{2[f(i+1)]^2}$. (In the event that $k \le \frac{n}{2[f(i+1)]^2}$, we
choose the lower approximation to be the (non-existent) 0-th item with a value of
$-\infty$.) It is assumed that the input is presented in an array of size equal to the input
size. If $i = \log^* n$ the selection is achieved by sorting the input using the [AKS-83]
sorting network. For smaller values of $i$ we proceed as follows. We use a
reduction procedure with the following input/output relation.

Input: A set $S_i$ of size at most $n/[f(i)]^2$, and an integer $k_i$. (A lower approximation
to the $k_i$-th item in $S_i$ is being sought; the error is to be at most $\frac{n}{2[f(i)]^2[f(i+1)]^2}$.)

Output: A set $S_{i+1}$ of size at most $n/[f(i+1)]^2$, and an integer $k_{i+1}$. (A lower
approximation to the $k_{i+1}$-th item in $S_{i+1}$ is to be found; the error is to be at most
$\frac{n}{2[f(i+1)]^2[f(i+2)]^2}$. This approximation will also be a lower approximation to the
$k_i$-th item in $S_i$, with error at most $\frac{n}{2[f(i)]^2[f(i+1)]^2}$.)

The input and output will be provided in arrays of size $|S_i|$ and $|S_{i+1}|$,
respectively. The reduction procedure runs in time $O(f(i))$, and uses $O(n/f(i))$
operations.

Method: Form sets of size $4[f(i+1)]^4$ items (take contiguous segments of items
from the input array). Sort each set using an [AKS-83] sorting network. The
output set comprises every $[f(i+1)]^2$-th item in each sorted set. The output is
provided in an array: namely every $[f(i+1)]^2$-th item in the input array, following
the sorting.
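
One reduction step can be sketched sequentially as follows. This is a hedged illustration, not the paper's parallel implementation: Python's built-in `sorted` stands in for the [AKS-83] network, the name `reduce_step` is ours, and the group size and sampling stride are left as parameters. Taking them to be 4[f(i+1)]^4 and [f(i+1)]^2 is our reading of the Method, giving 64 and 4 for i = 1.

```python
def reduce_step(items, group_size, stride):
    """Sort contiguous groups of `group_size` items; keep every `stride`-th item of each."""
    sample = []
    for start in range(0, len(items), group_size):
        group = sorted(items[start:start + group_size])  # stand-in for the sorting network
        sample.extend(group[stride - 1::stride])
    return sample

# i = 1 with n = 1024: f(1) = 1, f(2) = 2, so groups of 64 items and stride 4.
items = [(j * 389) % 1024 for j in range(1024)]  # a permutation of 0..1023
sample = reduce_step(items, group_size=64, stride=4)
assert len(sample) == 1024 // 4

# The r-th sampled item has rank at least 4r and less than 4r + 64 in the input
# (the rank of value v in this input is v + 1).
for r, y in enumerate(sorted(sample), start=1):
    assert 4 * r <= y + 1 < 4 * r + 64
```

The slack of 64 in the upper bound is the number of groups (16) times the stride (4): each group can hide up to one stride's worth of items below the sampled item.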

Note that the $r$-th item in $S_{i+1}$ is at least the $r[f(i+1)]^2$-th item in $S_i$, and is less
than the $\left(r[f(i+1)]^2 + \frac{n}{4[f(i)]^2[f(i+1)]^2}\right)$-th item in $S_i$. We choose
$k_{i+1} = \frac{k_i}{[f(i+1)]^2} - \frac{n}{4[f(i)]^2[f(i+1)]^4}$, so that the $k_{i+1}$-th item in $S_{i+1}$ is at least the
$\left(k_i - \frac{n}{4[f(i)]^2[f(i+1)]^2}\right)$-th item in $S_i$, and is less than the $k_i$-th item in $S_i$.